Unifying Single-Linkage and Max-Sum Clustering
Authors
Abstract
To unify the motivating principles behind Max-Sum and Single-Linkage clustering, we take an axiomatic approach to the theory of clustering. We discuss abstract properties of clustering functions, following the framework of Kleinberg [7]. By relaxing one of Kleinberg’s clustering axioms, we sidestep his impossibility result and arrive at a consistent set of axioms. We propose extending these axioms, with the aim of providing an axiomatic taxonomy of clustering paradigms. Such a taxonomy should give users some guidance on choosing the appropriate clustering paradigm for a given task. The main result of this paper is a set of abstract properties that characterize the Max-Sum and Single-Linkage clustering functions. These functions have traditionally been treated separately, as the principles motivating their use have never been unified. In this paper we provide alternative axiomatic definitions for Max-Sum and Single-Linkage that shed light on their common underlying principles and can guide the user in deciding which of them, if either, is appropriate for a particular task.
Similar resources
Characterizing Properties for Q-Clustering
We uniquely characterize two members of the Q-Clustering family in an axiomatic framework. We introduce properties that use known tree constructions for the purpose of characterization. To characterize the Max-Sum clustering algorithm, we use the Gomory-Hu construction, and to characterize Single-Linkage, we use the Maximum Spanning Tree. Although at first glance it seems these properties are ‘...
Towards a Principled Theory of Clustering
To answer the question “Which clustering function should one use?” for a given task, we consider an axiomatic approach to the theory of Clustering, with special focus on uniqueness theorems characterizing popular clustering functions. We argue that such theorems can be used to decide exactly when a particular clustering function should be used or avoided. We discuss abstract properties of clust...
Choosing the Best Hierarchical Clustering Technique Based on Principal Components Analysis for Suspended Sediment Load Estimation
1- INTRODUCTION The assessment of watershed sediment load is necessary for controlling soil erosion and reducing the potential for sediment production. Differing estimates of sediment amounts, along with the lack of long-term measurements, limit access to reliable data series of erosion rate and sediment yield. Therefore, the observed data of suspended sediment load could be used to ...
Development of An External Cluster Validity Index using Probabilistic Approach and Min-max Distance
Validating a given clustering result is a very challenging task in the real world. For this purpose, several cluster validity indices have been developed in the literature. Cluster validity indices fall into two main categories: external and internal. External cluster validity indices rely on some available supervised information, while internal validity indices utilize the intrinsic structu...
An Impossibility Theorem for Clustering
Although the study of clustering is centered around an intuitively compelling goal, it has been very difficult to develop a unified framework for reasoning about it at a technical level, and profoundly diverse approaches to clustering abound in the research community. Here we suggest a formal perspective on the difficulty in finding such a unification, in the form of an impossibility theorem: f...